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Introduction 
Module for ELEC 301 project 


What is Matrix Completion 
In 2006, Netflix issued a million dollar challenge to the world: 


“Ts there a computer algorithm that can accurately predict a user’s 
movie preferences?” 


In the contest, a data matrix was given that contained ratings of thousands 
of movies from thousands of examinees, but it was only 2% completed. 
Contestants for this Netflix Challenge had to complete the matrix and 
provide the optimal algorithms for the task. 


The Netflix Prize was won in 2009, but the ideas and algorithms generated 
to complete matrices remain vast and powerful in real world applications. 
Simply put, the Matrix Completion algorithm can be used for any areas that 
involve using a data matrix. 


From a more scientific perspective, the 2008 paper, Exact Matrix 
Completion via Convex Optimization by Candes and Recht formalized a 
majorization minimization algorithm for matrix completion. Eric Chi’s 
2014 article Getting to the Bottom of Matrix Completion and Nonnegative 
Least Squares with the MM Algorithm provides a more grounded 
framework for the problem and explains the mathematical concepts behind 
matrix completion. 


A visual representation of matrix completion. 


A visual representation of matrix completion. 


When Does Matrix Completion WOrk 


Given a sparse matrix with movies along one axis and users along another, 
the algorithm had to predict how those users would rate movies they have 
not seen. The solution, known as Matrix Completion, provided a good 
estimate of sparse data, provided it satisfied the following: 


1. The matrix must be low rank 
2. The unobserved indices in the matrix must be uniformly distributed 


In terms of the Netflix Problem, the matrix was extremely sparse -- with 
millions of users and movies, less than 2% of the matrix was actually filled. 
The matrix also followed the above assumptions, specifically that there are 
a few “types” of people who watch Netflix (an action movie lover, a rom- 
com fanatic, etc.), making it low rank, and that each user’s reviews are 
spread uniformly throughout the matrix. 


Characterizing the Problem 


Often, in the real world, these idealities are not upheld. It is very rare to find 
a matrix that is both perfectly uniform and low rank. In order to better 
understand matrix completions’ application to the real world, our project 


aimed to stretch the second requirement and better characterize the 
algorithm’s limits. 


Specifically, we decided to focus on the requirement that the unobserved 
indices in the matrix must be uniformly distributed. How uniform do the 
unobserved entries need to be? At what point does matrix completion “stop 
working”? 


Even more importantly, what does a plot of the error look like as a function 
of uniformity? We know that non-uniform data will result in a predicted 
matrix that is very dissimilar to the actual matrix, and we know that 
uniform data will result in a predicted matrix that very similar to the actual 
data, but what happens in between? Does a small amount of non-uniformity 
result in an unusable matrix, or can matrix completion continue to work 
under less than ideal conditions? 


Real World Implications 


While it is important to characterize algorithms to have a better theoretical 
understanding of how and when they work, this research has very salient 
real world applications as well. 


Imagine an old picture with non-uniform noise distributed throughout it -- 
maybe one area of the photo is particularly noisy. If matrix completion can 
work in the conditions described previously, it would be able to reconstruct 
those images. 


Even more importantly, matrix completion is used to predict cancer survival 
rates, among other medical applications. There is no guarantee that this data 
is uniform, but maybe matrix completion can still be trusted in these 
situations despite this limitation. 


Implementation of Matrix Completion 
Summarizes the implementation of matrix completion using Chis paper 


Objective of Matrix Completion 
minimize 5 | Poe (X) — Poe(Z)|3 + AlZile, 


Equation 1 : Objective of the matrix completion 


The objective of matrix completion is to minimize Equation 1 stated above. 
X is the matrix of actual data, and Z is our prediction model. In real world 
situations, such as the Netflix Problem, the actual data X is only partially 
filled. Thus, the term PQc(X) represents the observed indices of the data 
matrix X, and the following term represents the observed indices of our 
model matrix Z. The first half of Equation 1, the Frobenius norm of the 
differences between the observed X and the model Z, would therefore 
signify how closely the model resembles the actual data. 


The second half of Equation 1, the nuclear norm of the model matrix Z, is 
called the regularization term, which is used here to represent the rank of 
the model, which is an appropriate indicator of the simplicity of the model. 
Simple matrices would have smaller quantities and magnitudes of singular 
values. 


However, as the equation shows, there exists a “tradeoff between the 
[simplicity] (rank) of the model and how well the model matches the data.” 
If the model is too simple, it is often not accurate. If the model is perfectly 
accurate, it is often not simple. 


Overview of Majorization Minimization Algorithm 


The first half of Equation 1 still provides a challenge because it only uses 
terms that are observed. The question that can be asked at this point would 
be how can we convert the projection matrix, PQc(X), into a fully 
completed matrix and still provide the same result for the model Z? 


Here we introduce the majorization-minimization (MM) algorithm. The 
following explanation will give an overview of the algorithm applied to 


T(x), 


The first step of the algorithm is majorizing the function f(x). Majorizing 
means finding a good surrogate of the actual function f(x) anchored at a 
point xn. This means that the surrogate function g(x) must have the same 
value with f(x) at xn. g(x) must always be greater than f(x) at any point of 
x. In other words, g(x) must dominate f(x). 


After we find the majorization function g(x), the second step of the 
algorithm is minimization, which means to find the lowest value of g(x). x 
at the lowest value of g(x) would be our next anchor for the majorization of 
the next iteration of MM algorithm. 


In the next part, we will look at how we implement the MM algorithm into 
the matrix completion problem. 


Majorization for Matrix Completion Problem 
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Using the quadratic majorization of the matrix, we can therefore simplify 
the original problem with the surrogate matrix Y written below: 


1 
minimize all¥ — Z|? + AllZl-- 


Minimization for Matrix Completion Problem (Soft Threshold 
Operator and 4 steps) 
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Given the majorization matrix Y, we now implement the minimization 
using the 4 steps shown above, which include building a Y matrix, singular 
value decomposition of Y, soft thresholding of the ranks, and building the 
next model matrix Z. Soft thresholding, is essentially the solution to the 
minimization of the majorization of Equation 1. The soft thresholding 
process is shown below: 


1 
S(s,) = arg min 5 (s — t)? + Ale| 


s—-xX ifs>d4, 
=¢s+xX ifs < —A, and 
0 |s| < A. 


The detailed proof of achieving minimization through the soft thresholding 
process can be found on the articles referenced at the end of the report. 
(“Getting to the Bottom of Matrix Completion”) The relevant code we 
wrote is shown below: 


function Znew ~ mc_Znew(X, obs, unobs, lambda, threshold, 


converge = false; 
{m,n} = size (X); 
znew = warm start; 


while (~converge) 


% Step 1 - Determine Y Matrix 
Zprevious = Znew; 

Y = zeros (m,n); 

Y(obs) = X(obs); 

Y(unobs) = Zprevious (uncobs); 


% Step 2 - Singular Value Decomposition 
[U,D,V] = svd(¥); 


% Step 3 - Determine Dbar vie Soft Thresholiding 
d = diag(D); 

d_s = sign(d).*max(abs(d) - lambda, 0); 

Dbar = diag(d_3s); 


% Step 4 - Construct the next iterate 


Znew = U*Dbar*transpose (V): 


& Step 5 - Check for Convergence 

norm_diff = norm(Zprevious - Znew, ‘fro') / norm(Zprevious, 
abs_norm diff = norm(Zprevious~ Znew, ‘fro')?; 

converge = (morm_diff < threshold) | (abs_norm_diff == 0); 


K-fold Cross Validation 


warm start) 


*fro’)s 


We used the 10-fold cross validation process to find the optimal 
regularization term lambda A. The process includes randomly dividing the 
observed indices into 10 folds. Then, we remove each fold and use the rest 
of the indices to find the model Z. For each fold, therefore, we find the 
mean squared difference of the observed indices between data X and model 
Z. We then average the results of each 10 fold to get a comprehensive idea 
of how successful the lambda was. We iterate this process for thousands of 
lambdas to find the optimal value. The matlab code we wrote is shown 


below: 


function (lambda best,MSE matrix, lambda seq, X hatt] = mc kfold(X, uncobs, obs, k, lambda min, lambda num) 


% create lambda sequence 

X_missing = X; 

X_missing(unobs) = 0; 

(m,n) = size(X): 

8 = svd(X_missing): 

lambda_max = max(s): 

lambda_seq = logspace(logl0(lambda_ min), loglO(lambda_max), lambda num); 


% partition observed data into kfolds 
N = length (obs): 
kfolds = cell(k,1): 
obs_tmp = obs: 
for i=i:k 
af 4 <= mod(N,k) 
gizel = ceil(N/k): 
else 
sizel = floor(N/k): 
end 
kfolds(i,1) = obs_tmp(i:sizel); 
obs_tmp = obs _tmp(sizel+1:end); 
end 


MSE matrix = zeros (length (lambda seq), k): 
warm start = zeros (m,n); 


% for each given labmda 
for i=length (lambda seq) :-1:1 


h disp((‘*Lambda ‘,num2str(i),’ of ',num2str(length(lambda seq)))}) % display the lambda 
% for each of the k folds 
for j=i:k 

‘ Gisp(('Fold ',num2str(}),' of *,num2str(k))) *% display the kfold 


% Concatenate the original unobserved entries with the kfold 
unobs_k = cat(2,unobs, kfolda(3,1)): 
obe_k = setdiff(obs, kfolds(3,1)): 


*% Call matrix completion on the fold 
X_hat = mc _Znew(X, obsa_k, unobs_k, lambda _seq(i), 0.001, warm_start); tthreshold instead of 0.001 originally 


*% Calculate MSE and add it to matrix 
eq_err = (X_hat(kfolde(3,1)) ~ X(kfolda(3,2))).°27 
MSE_matrix(i,3) = mean(sq_err): 


kfolds{i,1} = obs_tmp(l:sizel): 
obs_tmp = obs_tmp(sizel+l:end): 
end 


MSE_matrix = zeros(length (lambda_seq),k): 
warm_start = zeros(m,n); 


% for each given labmda 
for i=length (lambda seq) :-1:1 


Gisp([*Lambda *,num2str(i)," of ',num2str(length(lambda_seq))]) % display the lambda 


% for each of the k folds 
for 3=1:k 


Gisp({'Fold ',num2str(j),* of ',mum2str(k)]) *% display the kfold 


% Concatenate the original unobserved entries with the kfold 
unobs_k = cat(2,unobs, kfolds(j,1)): 
obs_k © setdiff(obs, kfoldsi),1)): 


% Call matrix completion on the fold 
X_hat = mc_Znew(X, obs_k, unobs_k, lambda_seq(i), 0.001, warm_start); %threshold instead of 0.001 originally 


% Calculate MSE and add it to matrix 
sq_err = (X_hat(kfoldsij,1}) - X(kfolds{3,1})).°2: 
MSE_matrix(i,}) = mean(sq_err): 


% Makes it run faster 
* warm_start = X_hat; 


end 
end 


% take averages and std 
mse_mean = mean(MSE_matrix,2); % mean across rows 
% mse_sd =~ std(MSE_matrix,2); % standard deviation across rows 


% plot CV MSE curve here!'! 
% plot (lambda _seq,mse_mean) 


* return best lambde 

{~, lam_ind) = min(mse_mean); 

lambda_best = lambda_seq(lam_ind); 

X_hatt = mc_Znew(X, obs, unobs, lambda best, 0.001, warm start): 


Breaking Matrix Completion 
Summary of how we stretched matrix completion to discover its limits. 


The Stress Test 


The main focus of our project was not to simply implement matrix 
completion -- that has been done many times in the past and is proven to 
work. What we wanted to do was put the algorithm through a “stress test” 
to understand its limits. 


To do this, we devised the following plan: 


1. Construct a low rank matrix 

2. Create a model of uniformity 

3. Vary the uniformity on unobserved indices in a matrix 

4. Complete the matrix 

5. Measure mean squared error between the original matrix and the 
completed matrix 

6. Plot mean squared error as a function of uniformity 


Constructing a Low Rank Matrix 


In order to construct a low rank matrix, remove indices, and compare the 
results of matrix completion with the actual matrix, we first created two 
separate matrices of size m x r andr x n with random entries between 0 and 
1, where r in the desired rank. We then multiplied those two matrices 
together, constructing one matrix of size m x n and rank r. 


Modeling Uniformity 


Our second challenge was to generate a model of uniformity that provided a 
simple way to change how uniform the unobserved indices were. We 
decided to use a jointly Gaussian distribution, modifying the covariance to 
control uniformity. The idea is simple: low covariance results in a high 
concentration of indices removed from the center of the matrix, while high 
covariance results in an almost perfectly uniform distribution. 


In the figure below, the matrix on the left shows a matrix with a very 
uniform distribution of unobserved entries, characterized by a jointly 
gaussian distribution with high covariance. The matrix on the right 
demonstrates a data set with a high degree of nonuniformity. The 
probability peaks at the center of the matrix, resulting in unobserved indices 
which are highly concentrated in the middle of the matrix. 


Probability Density 


Fobability Ders ty 


Plotting MSE and Uniformity 


After completing the matrix, we calculated the mean squared error between 
the original matrix and the prediction provided by matrix completion and 
stored it in an array, along with the corresponding uniformity percentage. 
We calculated the uniformity percentage by normalizing each covariance by 
the largest covariance used, providing a scale between 0 and 100%. 


Results 
This is where we talk about the results of our project 


Expected results 


The first observation we made was the obvious -- the mean squared error 
for matrices that had unobserved indices which were completely non- 
uniform was very high, and had a large standard deviation, whereas the 
perfectly uniform matrices had a low mean squared error with a small 
standard deviation. 


These results confirmed that our implementation of the algorithm and the 
code for the other parts of our project worked correctly. 


Accuracy of Matrix Completion vs. Randomness of Sampling 
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Unexpected Results 


The more surprising result of our tests, showed that matrix completion can 
tolerate a high level of non-uniformity. Our graph showed that at, according 
to our model, at about 15% uniformity, matrix completion returns results 
that are comparable to 100% uniformity. 


Even the visual depictions of this kind of uniformity are interesting to 
ponder. At 15%, there is still a clear accumulation of unobserved indices at 
the center of the matrix, yet the algorithm works just as well. 


The implications for this are tremendous, and the results show that 
uniformity isn’t a huge concern when applying matrix completion to real 
world problems. Unless the matrix is completely non-uniform, matrix 
completion will provide valid results. 


Future Work 
This section details our plans for expanding on the project and its results 


Generalizing Non Uniformity 


As stated earlier, we wanted our measure of nonuniformity to be simple and 
easy to control. For the purposes of this project, removing entries based on 
a jointly Gaussian Distribution met our requirements. However, this 
measure of nonuniformity is both unrealistic and overly simplistic for real 
world data sets. With this in mind, future work would involve finding ways 
to generalize our notion of nonuniformity so that our results are more 
accurate with regards to real world data sets. 


Testing with Real World Data 


One of the most obvious applications of matrix completion is image 
reconstruction. The figure below shows how matrix completion can be used 
to reconstruct a noisy image. The original image had a uniform distribution 
of removed indices, so matrix completion was able to successfully recreate 
the image. 


Our results suggest that the algorithm would similarly be able to complete 
the image if the removed indices had a certain degree of non uniformity. 
Further testing would involve looking for any kinds of exceptions to our 
results. Due to the visual nature of these matrices, it would be very easy to 
see when matrix completion succeeds or fails. We could then look for 


patterns to generalize the issues the algorithm takes with varying degrees of 
non uniformity in the missing indices. 


Essentially, this is expanding the whole idea of our stress test to be more 
cognizant of real world issues of nonuniformity in data sets. 


Alternative Approaches 


This whole project has been guided by the principal that the existing 
solution for matrix completion is as good as it will get. At our poster 
presentation, one of the judge’s proposed an interesting idea. Could we (or 
others with better qualifications) improve the solution? Is there a way to fix 
the algorithm so that it works for all levels of uniformity? 


We don't know the answer to this question and probably don't have the 
analytical skills to begin looking for one. However, a simple but potentially 
effective solution quickly comes to mind. We could develop a preprocessing 
algorithm that makes the data uniform before running matrix completion. 
Such an algorithm would need to be fast and generalized to all sorts of data 
sets to truly be a viable solution. 
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Conclusion 
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